{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ur8xi4C7S06n"
      },
      "outputs": [],
      "source": [
        "# Copyright 2020 Google LLC\n",
        "#\n",
        "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "#     https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1c955a52adac"
      },
      "source": [
        "# Feedback or issues?\n",
        "For any feedback or questions, please open an [issue](https://github.com/googleapis/python-aiplatform/issues)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "eHLV0D7Y5jtU"
      },
      "source": [
        "# Vertex SDK for Python: Custom Training and AutoML Tabular Training Example\n",
        "\n",
        "To use this Colaboratory notebook, you copy the notebook to your own Google Drive and open it with Colaboratory (or Colab). You can run each step, or cell, and see its results. To run a cell, use Shift+Enter. Colab automatically displays the return value of the last line in each cell. For more information about running notebooks in Colab, see the [Colab welcome page](https://colab.research.google.com/notebooks/welcome.ipynb).\n",
        "\n",
        "This notebook demonstrate how to create both an AutoML model and a custom model based on a tabular dataset. It will require you provide a bucket where the dataset will be stored.\n",
        "\n",
        "Note: you may incur charges for training, prediction,  storage or usage of other GCP products in connection with testing this SDK."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "lld3eeJUs5yM"
      },
      "source": [
        "# Install Vertex SDK for Python, Authenticate, and upload of a Dataset to your GCS bucket\n",
        "\n",
        "\n",
        "After the SDK installation the kernel will be automatically restarted. You may see this error message `Your session crashed for an unknown reason` which is normal."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "sBfZtR4X1Dr_"
      },
      "outputs": [],
      "source": [
        "!pip3 uninstall -y google-cloud-aiplatform\n",
        "!pip3 install google-cloud-aiplatform\n",
        "import IPython\n",
        "\n",
        "app = IPython.Application.instance()\n",
        "app.kernel.do_shutdown(True)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "0117fe65129c"
      },
      "outputs": [],
      "source": [
        "import sys\n",
        "\n",
        "if \"google.colab\" in sys.modules:\n",
        "    import os\n",
        "\n",
        "    from google.colab import auth\n",
        "\n",
        "    auth.authenticate_user()\n",
        "    os.environ[\"GOOGLE_CLOUD_PROJECT\"] = MY_PROJECT"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "c0SNmTBeD2nV"
      },
      "source": [
        "### Enter your project and GCS bucket\n",
        "\n",
        "Enter your Project Id in the cell below. Then run the cell to make sure the Cloud SDK uses the right project for all the commands in this notebook."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "iqSQT6Z6bekX"
      },
      "outputs": [],
      "source": [
        "MY_PROJECT = \"YOUR PROJECT\"\n",
        "MY_STAGING_BUCKET = \"gs://YOUR BUCKET\"  # bucket should be in same region as Vertex AI"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "zaQRsZ7vwkmE"
      },
      "source": [
        "The dataset we are using is the California Housing Dataset, available locally in Colab. For more information about this dataset and its license, please visit: https://developers.google.com/machine-learning/crash-course/california-housing-data-description"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "V_T10yTTqcS_"
      },
      "outputs": [],
      "source": [
        "!gcloud config set project {MY_PROJECT}\n",
        "\n",
        "TRAIN_FILE_NAME = \"california_housing_train.csv\"\n",
        "!gsutil cp sample_data/{TRAIN_FILE_NAME} {MY_STAGING_BUCKET}/data/\n",
        "\n",
        "gcs_csv_path = f\"{MY_STAGING_BUCKET}/data/{TRAIN_FILE_NAME}\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "rk43VP_IqcTE"
      },
      "source": [
        "# Initialize Vertex SDK for Python\n",
        "\n",
        "Initialize the *client* for Vertex AI"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "VCiC9gBWqcTF"
      },
      "outputs": [],
      "source": [
        "from google.cloud import aiplatform\n",
        "\n",
        "aiplatform.init(project=MY_PROJECT, staging_bucket=MY_STAGING_BUCKET)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "35QVNhACqcTJ"
      },
      "source": [
        "# Create Managed Tabular Dataset from CSV"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "4OfCqaYRqcTJ"
      },
      "outputs": [],
      "source": [
        "ds = aiplatform.TabularDataset.create(\n",
        "    display_name=\"housing\", gcs_source=[gcs_csv_path], sync=False\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Iw7wMxqNOBL6"
      },
      "source": [
        "# Write your Training Script\n",
        "- Write this cell as a file which will be used for custom training."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "hXJDoIkaOTKu"
      },
      "outputs": [],
      "source": [
        "%%writefile training_script.py\n",
        "\n",
        "import pandas as pd\n",
        "import os\n",
        "import tensorflow as tf\n",
        "from tensorflow import keras\n",
        "from tensorflow.keras import layers\n",
        "\n",
        "\n",
        "# uncomment and bump up replica_count for distributed training\n",
        "# strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()\n",
        "# tf.distribute.experimental_set_strategy(strategy)\n",
        "\n",
        "target = 'median_house_value'\n",
        "\n",
        "def aip_data_to_dataframe(wild_card_path):\n",
        "    return pd.concat([pd.read_csv(fp.numpy().decode())\n",
        "                      for fp in tf.data.Dataset.list_files([wild_card_path])])\n",
        "\n",
        "def get_features_and_labels(df):\n",
        "    features = df.drop(target, axis=1) \n",
        "    return {key: features[key].values for key in features.columns}, df[target].values\n",
        "\n",
        "def data_prep(wild_card_path):\n",
        "    return get_features_and_labels(aip_data_to_dataframe(wild_card_path))\n",
        "\n",
        "train_features, train_labels = data_prep(os.environ[\"AIP_TRAINING_DATA_URI\"])\n",
        "\n",
        "feature_columns = [tf.feature_column.numeric_column(name) for name in \n",
        "                   train_features.keys()]\n",
        "\n",
        "model = tf.keras.Sequential([\n",
        "            layers.DenseFeatures(feature_columns),\n",
        "            layers.Dense(64),\n",
        "            layers.Dense(1)])\n",
        "model.compile(loss='mse', optimizer='adam')\n",
        "\n",
        "model.fit(train_features, train_labels,\n",
        "          epochs=10,\n",
        "          validation_data=data_prep(os.environ[\"AIP_VALIDATION_DATA_URI\"]))\n",
        "print(model.evaluate(*data_prep(os.environ[\"AIP_TEST_DATA_URI\"])))\n",
        "\n",
        "# save as Vertex AI Managed model\n",
        "tf.saved_model.save(model, os.environ[\"AIP_MODEL_DIR\"])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6-bBqipfqcTS"
      },
      "source": [
        "# Launch Training AutoML and Custom Training Jobs to Create Models\n",
        "\n",
        "Once we have created your dataset and defined your training script, we will create a model."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "aA41rT_mb-rV"
      },
      "outputs": [],
      "source": [
        "custom_job = aiplatform.CustomTrainingJob(\n",
        "    display_name=\"train-housing\",\n",
        "    script_path=\"training_script.py\",\n",
        "    container_uri=\"gcr.io/cloud-aiplatform/training/tf-cpu.2-2:latest\",\n",
        "    requirements=[\n",
        "        \"gcsfs==0.7.1\",\n",
        "    ],\n",
        "    model_serving_container_image_uri=\"gcr.io/cloud-aiplatform/prediction/tf2-cpu.2-2:latest\",\n",
        ")\n",
        "\n",
        "custom_model = custom_job.run(\n",
        "    ds, replica_count=1, model_display_name=\"housing-model\", sync=False\n",
        ")\n",
        "\n",
        "automl_job = aiplatform.AutoMLTabularTrainingJob(\n",
        "    display_name=\"train-housing-automl_1\",\n",
        "    optimization_prediction_type=\"regression\",\n",
        "    optimization_objective=\"minimize-rmse\",\n",
        "    column_transformations=[\n",
        "        {\"numeric\": {\"column_name\": \"longitude\"}},\n",
        "        {\"numeric\": {\"column_name\": \"latitude\"}},\n",
        "        {\"numeric\": {\"column_name\": \"housing_median_age\"}},\n",
        "        {\"numeric\": {\"column_name\": \"total_rooms\"}},\n",
        "        {\"numeric\": {\"column_name\": \"total_bedrooms\"}},\n",
        "        {\"numeric\": {\"column_name\": \"population\"}},\n",
        "        {\"numeric\": {\"column_name\": \"households\"}},\n",
        "        {\"numeric\": {\"column_name\": \"median_income\"}},\n",
        "    ],\n",
        "    optimization_objective_recall_value=None,\n",
        "    optimization_objective_precision_value=None,\n",
        ")\n",
        "\n",
        "# This will take around an hour to run\n",
        "automl_model = automl_job.run(\n",
        "    dataset=ds,\n",
        "    target_column=\"median_house_value\",\n",
        "    training_fraction_split=0.6,\n",
        "    validation_fraction_split=0.2,\n",
        "    test_fraction_split=0.2,\n",
        "    weight_column=None,\n",
        "    budget_milli_node_hours=1000,\n",
        "    model_display_name=\"house-value-prediction-model\",\n",
        "    disable_early_stopping=False,\n",
        "    predefined_split_column_name=None,\n",
        "    sync=False,\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5vhDsMJNqcTW"
      },
      "source": [
        "# Deploy Model"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Y9GH72wWqcTX"
      },
      "outputs": [],
      "source": [
        "custom_endpoint = custom_model.deploy(machine_type=\"n1-standard-4\", sync=False)\n",
        "automl_endpoint = automl_model.deploy(machine_type=\"n1-standard-4\", sync=False)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "nIw1ifPuqcTb"
      },
      "source": [
        "# Predict on Endpoint"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "KDd6VKWDIRgh"
      },
      "outputs": [],
      "source": [
        "custom_endpoint.wait()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "5MssDOgyIPIK"
      },
      "outputs": [],
      "source": [
        "# This sample is taken from an observation where median_house_value = 94600\n",
        "prediction = custom_endpoint.predict(\n",
        "    [\n",
        "        {\n",
        "            \"longitude\": -124.35,\n",
        "            \"latitude\": 40.54,\n",
        "            \"housing_median_age\": 52.0,\n",
        "            \"total_rooms\": 1820.0,\n",
        "            \"total_bedrooms\": 300.0,\n",
        "            \"population\": 806,\n",
        "            \"households\": 270.0,\n",
        "            \"median_income\": 3.014700,\n",
        "        },\n",
        "    ]\n",
        ")\n",
        "prediction"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "vNFA7MmFf_zd"
      },
      "outputs": [],
      "source": [
        "automl_job.state"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "KjVy8tFXgC6w"
      },
      "outputs": [],
      "source": [
        "automl_endpoint.wait()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "3k6-rSZPqcTc"
      },
      "outputs": [],
      "source": [
        "# This sample is taken from an observation where median_house_value = 94600\n",
        "prediction = automl_endpoint.predict(\n",
        "    [\n",
        "        {\n",
        "            \"longitude\": \"-124.35\",\n",
        "            \"latitude\": \"40.54\",\n",
        "            \"housing_median_age\": \"52.0\",\n",
        "            \"total_rooms\": \"1820.0\",\n",
        "            \"total_bedrooms\": \"300.0\",\n",
        "            \"population\": \"806\",\n",
        "            \"households\": \"270.0\",\n",
        "            \"median_income\": \"3.014700\",\n",
        "        },\n",
        "    ]\n",
        ")\n",
        "prediction"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "name": "AI_Platform_(Unified)_SDK_Custom_Model_Training_AutoML_Tabular_Model_Training.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
